Memory Fetch Granule

ABSTRACT

Systems, apparatuses, and methods for implementing a memory fetch granule for real-time agents are described. A computing system includes a plurality of real-time agents coupled to memory via an interconnect fabric and a memory controller. The efficiency of the memory controller is determined by the number of bank groups in the memory devices coupled to the memory controller. A memory fetch granule is defined for the memory controller based on the amount of data that can be accessed in parallel on the memory device in back-to-back access cycles. Each real-time agent accumulates memory requests for sequential physical addresses until the amount of data referenced by the requests reaches the size of the memory fetch granule. Once the memory fetch granule is reached, the real-time agent sends the requests to the memory controller via the fabric. This helps to ensure that the requests will arrive at the memory controller near enough to each other to get grouped together.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 17/230,490, entitled “Memory Fetch Granule”, filed Apr. 14, 2021, the entirety of which is incorporated herein by reference.

BACKGROUND

Technical Field

Embodiments described herein relate to digital systems and, more particularly, to techniques for memory controllers to efficiently use bandwidth.

Description of the Related Art

Quality of Service (QoS) is a policy that enables a computing system to provide required performance for all agents in the system. The policy can be defined as a set of requirements that each agent, fabric component, and memory subsystem component should follow to ensure the proper memory service level of each agent in the system. As used herein, the term “agent” is defined as a component generating traffic within a computing system. Non-limiting examples of agents include a central processing unit (CPU), graphics processing unit (GPU), display, and so on. QoS is a set of mechanisms for guaranteeing fabric, memory controller, and memory bandwidth over a bounded and pre-defined period of time.

The QoS goal is to provide every agent in the system the appropriate service from the memory subsystem. This service includes at least providing sufficient bandwidth in a given time window, low latency access to memory, maintaining memory ordering rules, and preventing head-of-line blocking. Real-time (RT) traffic has a stringent requirement where a required amount of bandwidth should be guaranteed with bounded latency to prevent functional failures such as display underrun. In order to satisfy the bandwidth requirements of all the RT agents, it is important that the memory controllers efficiently use bandwidth.

Various challenges can be encountered when trying to ensure that the memory controllers are efficiently using the available memory bandwidth. For example, some computing systems have a distributed memory subsystem with multiple RT agents and multiple memory controllers without a single point of arbitration and decision making. In such a system, lower priority RT agents can interfere with a higher priority RT agent. Also, multiple RT streams can access the same memory channel, creating head-of-line blocking and effectively reducing the available RT memory bandwidth for a given stream. Additionally, an inefficient memory access pattern to the memory channels can significantly reduce the channel throughput.

SUMMARY

Systems, apparatuses, and methods for implementing a memory fetch granule are contemplated.

In one embodiment, a variety of different types of traffic traverse a communication fabric connecting together a plurality of agents and one or more endpoints (e.g., memory device). The different types of traffic may include real-time (RT) traffic. In one embodiment, satisfying RT guarantees in the system requires a known worst case amount of memory bandwidth from each memory device. This worst case memory bandwidth is determined by how evenly data is distributed over the banks and bank groups of a memory device (e.g., dynamic random-access memory (DRAM) device) in the most extreme conditions. In order to guarantee sufficient RT bandwidth from a DRAM channel, RT agents shall access data at a granularity larger than a cache line. This data access granularity is referred to herein as a “memory fetch granule”.

In one embodiment, the memory controller notifies the RT agents of a size of the memory fetch granule. In another embodiment, the size of the memory fetch granule is programmed into the RT agents by software. In other embodiments, other techniques for informing the RT agents of the memory fetch granule size can be employed. The RT agents then attempt to group their memory requests together into memory fetch granules prior to sending the memory requests to memory. At arbitration points in the communication fabric, the memory fetch granule is arbitrated as a group to keep the requests together. This helps to ensure that the requests will arrive at the memory controller near enough to each other to get grouped together.

These and other embodiments will be further appreciated upon reference to the following description and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and further advantages of the methods and mechanisms may be better understood by referring to the following description in conjunction with the accompanying drawings, in which:

FIG. 1 is a generalized block diagram of one embodiment of a SOC.

FIG. 2 is a generalized block diagram illustrating one embodiment of a system with multiple memory controllers.

FIG. 3 is a block diagram of one embodiment of a memory system.

FIG. 4 is a block diagram of one embodiment of an arbiter.

FIG. 5 is a flow diagram of one embodiment of a method for implementing a memory fetch granule.

FIG. 6 is a flow diagram of one embodiment of a method for a real-time agent operating under the constraints of a memory fetch granule.

FIG. 7 is a flow diagram of one embodiment of a method for managing memory fetch granules for multiple memory controllers.

FIG. 8 is a flow diagram of one embodiment of a method for managing memory fetch granules for an arbiter forwarding an MFG.

FIG. 9 is a flow diagram of one embodiment of a method for an agent enqueuing or overriding enqueuing of requests.

FIG. 10 is a block diagram of one embodiment of a system.

While the embodiments described in this disclosure may be susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the appended claims.

DETAILED DESCRIPTION OF EMBODIMENTS

In the following description, numerous specific details are set forth to provide a thorough understanding of the embodiments described in this disclosure. However, one having ordinary skill in the art should recognize that the embodiments might be practiced without these specific details. In some instances, well-known circuits, structures, and techniques have not been shown in detail for ease of illustration and to avoid obscuring the description of the embodiments.

Referring now to FIG. 1, a block diagram of one embodiment of a system-on-a-chip (SOC) is shown. SOC 100 is shown coupled to a memory 135. As implied by the name, the components of the SOC 100 may be integrated onto a single semiconductor substrate as an integrated circuit “chip”. It is noted that a “chip” may also be referred to as a “die”. In some embodiments, the components may be implemented on two or more discrete chips in a system. However, the SOC 100 will be used as an example herein. In the illustrated embodiment, the components of the SOC 100 include a central processing unit (CPU) complex 120, on-chip peripheral components 140A-140N (more briefly, “peripherals”), a memory controller (MC) 130, and a communication fabric 110. The components 120, 130, and 140A-140N may all be coupled to the communication fabric 110. The memory controller 130 may be coupled to the memory 135 during use, and the peripheral 140B may be coupled to an external interface 160 during use. In the illustrated embodiment, the CPU complex 120 includes one or more processors (P) 124 and a level two (L2) cache 122.

The peripherals 140A-140N may be any set of additional hardware functionality included in the SOC 100. For example, the peripherals 140A-140N may include video peripherals such as an image signal processor configured to process image capture data from a camera or other image sensor, display controllers configured to display video data on one or more display devices, graphics processing units (GPUs), video encoder/decoders, scalers, rotators, blenders, etc. The peripherals may include audio peripherals such as microphones, speakers, interfaces to microphones and speakers, audio processors, digital signal processors, mixers, etc. The peripherals may include peripheral interface controllers for various interfaces 160 external to the SOC 100 (e.g., the peripheral 140B) including interfaces such as Universal Serial Bus (USB), peripheral component interconnect (PCI) including PCI Express (PCIe), serial and parallel ports, etc. The peripherals may include networking peripherals such as media access controllers (MACs). Any set of hardware may be included.

In one embodiment, SOC 100 includes CPU complex 120. The CPU complex 120 may include one or more CPU processors 124 that serve as the CPU of the SOC 100. The CPU of the system includes the processor(s) that execute the main control software of the system, such as an operating system. Generally, software executed by the CPU during use may control the other components of the system to realize the desired functionality of the system. The processors 124 may also execute other software, such as application programs. The application programs may provide user functionality, and may rely on the operating system for lower level device control. Accordingly, the processors 124 may also be referred to as application processors.

The CPU complex 120 may further include other hardware such as the L2 cache 122 and/or an interface to the other components of the system (e.g., an interface to the communication fabric 110). Generally, a processor may include any circuitry and/or microcode configured to execute instructions defined in an instruction set architecture implemented by the processor. The instructions and data operated on by the processors in response to executing the instructions may generally be stored in the memory 135, although certain instructions may be defined for direct processor access to peripherals as well. Processors may encompass processor cores implemented on an integrated circuit with other components as a system on a chip or other levels of integration. Processors may further encompass discrete microprocessors, processor cores and/or microprocessors integrated into multichip module implementations, processors implemented as multiple integrated circuits, and so on.

The memory controller 130 may generally include the circuitry for receiving memory operations from the other components of the SOC 100 and for accessing the memory 135 to complete the memory operations. The memory controller 130 may be configured to access any type of memory 135. For example, the memory 135 may be static random access memory (SRAM), or dynamic RAM (DRAM) such as synchronous DRAM (SDRAM) including double data rate (DDR, DDR2, DDR3, etc.) DRAM. Low power/mobile versions of the DDR DRAM may be supported (e.g., LPDDR, mDDR, etc.). The memory controller 130 may include queues for memory operations, for ordering (and potentially reordering) the operations and presenting the operations to the memory 135. The memory controller 130 may further include data buffers to store write data awaiting write to memory and read data awaiting return to the source of the memory operation.

The efficiency of memory controller 130 is determined by the number of bank groups in the memory devices which make up memory 135. Accordingly, in one embodiment, memory controller 130 defines a memory fetch granule for use by real-time agents when generating and sending memory requests which target memory 135 to memory controller 130. A memory fetch granule is defined based on the amount of data that can be accessed in parallel in memory 135 in back-to-back access cycles. Each real-time agent accumulates memory requests for sequential physical addresses until the amount of data referenced by the requests reaches the size of the memory fetch granule (unless the agent otherwise indicates that it has reached the end of a memory fetch granule, as discussed below). Once the size of the memory fetch granule is reached, the real-time agent sends the requests to the memory controller 130 via the fabric 110. This helps to ensure a high memory bandwidth efficiency is maintained by memory controller 130.
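To make the accumulation behavior concrete, the following is a minimal sketch in Python (illustrative only; the disclosure describes circuitry, and names such as MFG_SIZE, REQUEST_SIZE, and send_to_fabric are assumptions, not elements of the patent):

    # Illustrative sketch of a real-time agent accumulating sequential
    # requests until a full memory fetch granule (MFG) has been gathered.
    MFG_SIZE = 1024      # assumed granule size in bytes
    REQUEST_SIZE = 128   # assumed bytes referenced per request

    class RealTimeAgent:
        def __init__(self, send_to_fabric):
            self.pending = []                  # requests for the current MFG
            self.send_to_fabric = send_to_fabric

        def issue(self, address):
            # Accumulate; flush once a full MFG worth of data is referenced.
            self.pending.append(address)
            if len(self.pending) * REQUEST_SIZE >= MFG_SIZE:
                self.send_to_fabric(self.pending)
                self.pending = []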

The communication fabric 110 may be any communication interconnect and protocol for communicating among the components of the SOC 100. The communication fabric 110 may be bus-based, including shared bus configurations, cross bar configurations, and hierarchical buses with bridges. The communication fabric 110 may also be packet-based, and may be hierarchical with bridges, cross bar, point-to-point, or other interconnects. It is noted that the number of components of the SOC 100 (and the number of subcomponents for those shown in FIG. 1, such as within the CPU complex 120) may vary from embodiment to embodiment. There may be more or fewer of each component/subcomponent than the number shown in FIG. 1.

Turning to FIG. 2, an embodiment of a block diagram of a system 200 with multiple memory controllers 240A-N is illustrated. As shown, system 200 includes at least agents 210A-N, communication fabric 230, memory controllers 240A-N, and memory devices 250A-N. System 200 may include any number of other components which are not shown to avoid obscuring the figure. Agents 210A-N are representative of any number and type of real-time agents generating real-time traffic. Each agent 210A-N can generate and/or send a plurality of different data streams targeting different memory controllers 240A-N. While only agent 210A is shown in an expanded form, it is noted that agent 210N, and any number of other agents, may include similar components and a similar structure to agent 210A. Agent 210A includes at least control unit 215, queues 220A-B, and memory fetch granule (MFG) table 225. Agent 210A can also include other components such as a processor, cache, and/or other circuitry. Control unit 215 may be implemented using any combination of circuitry and/or program instructions. It is noted that control unit 215 may also be referred to as an arbiter.

In one embodiment, agent 210A groups together real-time memory requests in memory fetch granule sized chunks, with the size of the memory fetch granule varying according to the memory controller 240A-N targeted by the requests. For example, agent 210A attempts to accumulate a number of real-time memory requests targeting memory coupled to a given memory controller 240A-N based on the memory fetch granule corresponding to that given memory controller 240A-N. This accumulation of memory requests is performed prior to sending the memory requests to the given memory controller via communication fabric 230. It is noted that in some cases, an agent will not have enough data for a full memory fetch granule and instead will send a smaller data chunk. In these cases, the agent will include an indication that the smaller data chunk should be forwarded and processed as is. Any type of indication can be used to indicate that less than a memory fetch granule is being sent, with the type of indication varying according to the embodiment.

While memory requests are being generated and/or received from another component, the requests are buffered in queues 220A-B, which are representative of any number of queues. Control unit 215 queries MFG table 225 to determine the memory fetch granule for a given memory controller 240A-N. Each memory controller 240A-N may have a different memory fetch granule in one embodiment. In one embodiment, the memory fetch granule is based on the specific technology of the memory device(s) coupled to the memory controller. For example, the memory fetch granule is calculated based on the amount of data that can be accessed in parallel in a single access cycle. This parallel access capability is determined by the number of memory banks, the size of each memory bank, and how many memory banks can be accessed simultaneously. As the memory devices coupled to the different memory controllers 240A-N may vary, so too may the memory fetch granule vary from memory controller to memory controller. As shown in MFG table 225, memory controller 240A has a memory fetch granule size of 1 kilobyte (KB), memory controller 240B has a memory fetch granule size of 4 KB, and memory controller 240N has a memory fetch granule size of 256 bytes (B). These memory fetch granule sizes are merely indicative of one particular embodiment. In some embodiments, the memory fetch granule sizes may be the same for two or more of memory controllers 240A-N.
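A hypothetical MFG table mirroring the example sizes above can be expressed as a simple lookup (the controller identifiers are invented for illustration):

    # Sketch of MFG table 225: granule size per target memory controller.
    MFG_TABLE = {
        "MC_240A": 1024,   # 1 KB
        "MC_240B": 4096,   # 4 KB
        "MC_240N": 256,    # 256 B
    }

    def mfg_size_for(controller_id: str) -> int:
        # Control unit 215 would perform the equivalent of this lookup.
        return MFG_TABLE[controller_id]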

In one embodiment, communication fabric 230 includes arbiters 235A-N, which are representative of any number of arbiters. It is noted that arbiters 235A-N may also be referred to as switches 235A-N. Requests generated by agents 210A-N may traverse any number of arbiters 235A-N on their way to the target memory controller 240A-N. Each arbiter 235A-N includes an MFG table 237A-N, respectively. Each MFG table 237A-N may be structured in a similar fashion to MFG table 225. In one embodiment, each memory controller 240A-N conveys an indication of its memory fetch granule to agents 210A-N and arbiters 235A-N. This may occur at system start-up in one embodiment. In another embodiment, software programs MFG tables 225 and 237A-N based on a knowledge of the memory technology of memory devices 250A-N coupled to memory controllers 240A-N.

Referring now to FIG. 3, a block diagram of one embodiment of a memory system 300 is shown. In one embodiment, system 300 includes memory controller 305 and a plurality of memory circuits 330A-N. Memory circuits 330A-N are representative of any number of memory circuits, with each memory circuit having any number of bank groups 340A-N. Memory circuits 330A-N may be implemented using any of various memory technologies. Memory circuits 330A-N may need to be refreshed periodically if implemented as dynamic random-access memory (DRAM). Each bank group 340A-N includes any number of memory banks, with the number of banks per bank group varying according to the memory device technology. For example, in one embodiment, if memory circuits 330A-N are implemented using double data rate 5 synchronous DRAM (DDR5 SDRAM), then there would be four banks per bank group 340A-N. Other memory device technology may be structured with other numbers of banks per bank group 340A-N. Memory controller 305 communicates with memory circuits 330A-N via bus 325.

In one embodiment, memory controller 305 includes at least queue circuit 310, arbitration circuit 315, and optional hashing circuit 320. Queue circuit 310 stores requests received by memory controller 305 from various agents. Queue circuit 310 may include any number of separate queues. Arbitration circuit 315 selects which requests are to be sent to access certain memory bank groups 340A-N. Arbitration circuit 315 may attempt to spread out access requests to different banks of memory circuits 330A-N as there may be a delay between consecutive accesses to different pages of the same bank. In one embodiment, arbitration circuit 315 uses quality of service (QoS) information, memory fetch granule (MFG) information, and other information to determine which requests to send to memory circuits 330A-N. The MFG may be calculated for memory circuits 330A-N based on the size of banks in bank groups 340A-N and on the amount of data that is capable of being accessed in a parallel fashion in a single access cycle. Optional hashing circuit 320 generates hashes of the addresses of requests to attempt to spread sequential physical addresses to multiple different memory circuits 330A-N to increase the efficiency of accesses to memory circuits 330A-N. In some embodiments, hashing circuit 320 is included within memory controller 305, while in other embodiments, hashing circuit 320 is omitted from memory controller 305. Alternatively, hashing circuit 320 is included within memory controller 305 but can be disabled for a particular mode. Also, it should be appreciated that memory controller 305 may include other circuits which are not shown in FIG. 3.
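The disclosure does not specify a particular hash; purely as an assumption, one simple address-spreading scheme folds the address bits above the access granularity into a memory circuit index:

    # Illustrative address hash (an assumption, not the patent's circuit):
    # spread sequential physical addresses across memory circuits 330A-N.
    NUM_CIRCUITS = 4    # assumed number of memory circuits
    GRANULE_BITS = 8    # assumed 256-byte access granularity

    def circuit_index(addr: int) -> int:
        folded = addr >> GRANULE_BITS
        index = 0
        while folded:
            index ^= folded & (NUM_CIRCUITS - 1)   # XOR-fold low bits
            folded >>= NUM_CIRCUITS.bit_length() - 1
        return index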

Turning now to FIG. 4, a block diagram of one embodiment of an arbiter 400 is shown. In one embodiment, arbiter 400 includes at least control unit 410, queues 420A-N, and picker 430. It is noted that arbiter 400 may include other components which are not shown to avoid obscuring the figure. In one embodiment, control unit 410 enqueues received memory requests in queues 420A-N, which are representative of any number of queues. Picker 430 selects requests from queues 420A-N for forwarding to the next hop on the path to memory. Picker 430 may use any suitable arbitration scheme for determining which request has a higher priority for forwarding to memory than the other requests. It is noted that arbiter 400 can include similar components for the response path flowing from memory back to the agents. In one embodiment, arbiter 400 is located in a communication fabric (e.g., fabric 110 of FIG. 1). In other embodiments, arbiter 400 may be located at other points within a computing system (e.g., system 200 of FIG. 2).

Queues 420A-N may include any organization of queues, with different queues storing different types of requests based on various criteria. For example, queues 420A-N may include separate queues for reads and writes, and separate queues for different types of traffic. For example, in one embodiment, traffic can be divided into the three traffic classes of real-time (RT), low-latency (LLT), and best-effort (BULK), each having its own queue 420A-N. These three traffic classes can be implemented as three virtual channels (VCs). When control unit 410 receives a request from a given agent, control unit 410 determines which queue to store the request in based on the organization of queues 420A-N and the specific characteristics of the request.
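As a brief sketch of this queue selection (the dictionary-of-queues structure is an assumption made for illustration):

    # Illustrative per-traffic-class queues for the RT, LLT, and BULK
    # virtual channels described in the text.
    from collections import deque

    queues = {"RT": deque(), "LLT": deque(), "BULK": deque()}

    def enqueue(request, traffic_class: str):
        # Control unit 410 would select the queue from characteristics
        # of the request; here the class is passed in directly.
        queues[traffic_class].append(request)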

As shown, queue 420A stores requests 425A-D, which are representative of a memory fetch granule worth of requests. In this case, the size of the memory fetch granule (MFG) is equal to the data targeted by requests 425A-D. It is assumed for the purposes of this discussion that an RT agent generated requests 425A-D, which are traversing arbiter 400 on the path to memory. When the RT agent generates requests 425A-C, these requests do not include an end of MFG (EoM) flag, as shown with the “0” bits in the EoM field 427 of their entries in queue 420A. Since request 425D is the last request of the MFG, the RT agent tags request 425D with the EoM flag. This is denoted with the “1” bit in the EoM field 427 of the entry in queue 420A for request 425D. In one embodiment, when picker 430 selects request 425A for forwarding, picker 430 will continue to select requests 425B-D as credits become available until the EoM flag is detected for request 425D. At that point, picker 430 is free to select another request for forwarding on the path to memory.
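The EoM tagging of requests 425A-D might be sketched as follows (the Request type and field names are illustrative assumptions):

    # Sketch: only the last request of an MFG carries the EoM flag,
    # matching the "0"/"1" bits shown in EoM field 427.
    from dataclasses import dataclass

    @dataclass
    class Request:
        address: int
        eom: bool = False   # end-of-MFG flag

    def build_mfg(addresses):
        reqs = [Request(a) for a in addresses]
        if reqs:
            reqs[-1].eom = True   # tag the final request (like 425D)
        return reqs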

Turning now to FIG. 5, a generalized flow diagram of one embodiment of a method 500 for implementing a memory fetch granule is shown. For purposes of discussion, the steps in this embodiment (as well as for FIGS. 6-9) are shown in sequential order. However, in other embodiments some steps may occur in a different order than shown, some steps may be performed concurrently, some steps may be combined with other steps, and some steps may be absent.

A memory controller defines a memory fetch granule for a plurality of real-time agents to use when forwarding requests for accessing memory devices coupled to the memory controller (block 505). In one embodiment, the memory fetch granule is calculated based on the memory technology of the memory devices coupled to the memory controller. Specifically, in one embodiment, the parallel accessibility of the memory device determines the memory fetch granule. For example, if a given memory device coupled to the memory controller has four memory bank groups, each of which can be accessed at a granularity of 256 bytes per access, then the memory fetch granule would be defined as 1 kilobyte (KB), which is equal to the access granularity (256 bytes) multiplied by the number of memory bank groups.
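Expressed as arithmetic, the worked example of block 505 is simply the per-access granularity multiplied by the number of bank groups:

    # The example calculation from block 505.
    access_granularity = 256        # bytes per access
    num_bank_groups = 4
    mfg_size = access_granularity * num_bank_groups
    assert mfg_size == 1024         # 1 KB, as stated in the text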

The memory controller conveys an indication of the memory fetch granule to the plurality of real-time agents (block 510). For example, continuing with the above scenario, the memory controller would convey an indication that the size of the memory fetch granule is 1 KB to the plurality of real-time agents. In some embodiments, the memory controller also conveys the indication of the size of the memory fetch granule to arbiters and/or other components traversed by the requests on the way to memory.

In response to receiving the indication of the memory fetch granule, each real-time agent stores the indication of the size of the memory fetch granule in an MFG table (block 515). For example, a first real-time agent stores the indication of the size of the memory fetch granule in a first MFG table local to the first real-time agent, a second real-time agent stores the indication of the size of the memory fetch granule in a second MFG table local to the second real-time agent, and so on. In other embodiments, the indication of the size of the memory fetch granule can be stored in other locations.

Also, after receiving the indication of the memory fetch granule, each real-time agent accumulates memory requests targeting memory coupled to the memory controller until a sum of the data referenced by the memory requests is equal to a size of the memory fetch granule (block 520). One example of a method for implementing block 520 is described in further detail below in the discussion of method 600 (of FIG. 6). Next, each real-time agent sends the memory requests to the memory controller based at least in part on the requests reaching the size of the memory fetch granule (block 525). Also, the last memory request in the memory fetch granule is tagged with an end of memory fetch granule flag (block 530). After block 530, method 500 ends.

Referring now to FIG. 6, one embodiment of a method 600 for a real-time agent operating under the constraints of a memory fetch granule is shown. A real-time agent generates or receives a new memory request targeting memory of a given memory controller (block 605). The real-time agent determines or retrieves a size of the memory fetch granule corresponding to the given memory controller (block 610). In one embodiment, the memory fetch granule is stored in an MFG table (e.g., MFG table 225 of FIG. 2). In other embodiments, the memory fetch granule is stored in other locations or determined in other suitable manners.

Next, if the amount of data referenced by the new memory request is greater than or equal to the memory fetch granule (MFG) (conditional block 615, “no” leg), then the real-time agent sends the new memory request to the given memory controller (block 620). After block 620, method 600 ends. Otherwise, if the amount of data referenced by the new memory request is less than the memory fetch granule (MFG) (conditional block 615, “yes” leg), then the real-time agent waits for a subsequent memory request to be generated or received (block 625). If a threshold amount of time elapses prior to the subsequent memory request being generated or received (conditional block 630, “yes” leg), then the real-time agent sends the new memory request to the memory controller (block 620). The threshold amount of time may vary from embodiment to embodiment, and the threshold amount of time may be programmable by software.

If the subsequent memory request is generated or received prior to the threshold amount of time elapsing (conditional block 630, “no” leg), and the subsequent memory request is not to a sequential address of the address referenced by the new memory request (conditional block 635, “no” leg), then the real-time agent sends the new memory request to the memory controller (block 620). After block 620, method 600 ends.

Otherwise, if the subsequent memory request is to a sequential address of the address referenced by the new memory request (conditional block 635, “yes” leg), then the real-time agent determines if the combined amount of data referenced by the pending memory requests (e.g., the new memory request and the subsequent memory request) is greater than or equal to the memory fetch granule (conditional block 640). If the combined amount of data referenced by the pending memory requests is greater than or equal to the size of the memory fetch granule (conditional block 640, “yes” leg), then the real-time agent sends the pending memory requests to the memory controller (block 645). If the combined amount of data referenced by the pending memory requests is less than the size of the memory fetch granule (conditional block 640, “no” leg), then method 600 returns to block 625 with the real-time agent waiting for a subsequent memory request to be generated or received.
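A condensed sketch of this decision flow (blocks 605-645) follows; the polling-style loop, the next_request callback, and the req.addr/req.size fields are assumptions made to keep the sketch self-contained, and the circuitry described in the text would realize this differently:

    import time

    def accumulate_mfg(first_req, next_request, mfg_size, timeout_s):
        # Gather sequential requests until an MFG is full, the wait times
        # out, or a non-sequential request arrives (blocks 615-645).
        pending, total = [first_req], first_req.size
        deadline = time.monotonic() + timeout_s
        while total < mfg_size:
            req = next_request(until=deadline)          # block 625
            if req is None:                             # timeout: block 630
                break
            last = pending[-1]
            if req.addr != last.addr + last.size:       # block 635
                break                                   # non-sequential
            pending.append(req)
            total += req.size                           # block 640
        return pending   # sent to the memory controller (blocks 620/645)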

Turning now to FIG. 7, one embodiment of a method 700 for managing memory fetch granules for multiple memory controllers is shown. A real-time agent generates or receives a memory request targeting a given memory controller (block 705). In another embodiment, the agent is not classified as a real-time agent in that the agent generates or forwards a variety of different types of traffic, but the memory request is classified as a real-time request in this example. It is assumed for the purposes of this discussion that the system includes a plurality of memory controllers, with each memory controller coupled to one or more memory devices. Also, the system may include a plurality of real-time agents, although only a single real-time agent is described in the context of the discussion of method 700. However, it should be understood that the other real-time agents in the system can behave in a similar manner to the real-time agent described in the discussion of method 700.

Next, the real-time agent determines the memory fetch granule associated with the given memory controller (block 710). In one embodiment, the real-time agent queries an MFG table to determine the memory fetch granule associated with the given memory controller. In other embodiments, the real-time agent uses other techniques to determine the memory fetch granule associated with the given memory controller. If the memory fetch granule is a first size (conditional block 715, “first size” leg), then the real-time agent attempts to accumulate requests for the given memory controller until the amount of data referenced by the requests reaches the first size (block 720). After block 720, method 700 ends. If the memory fetch granule is a second size (conditional block 715, “second size” leg), then the real-time agent attempts to accumulate requests for the given memory controller until the amount of data referenced by the requests reaches the second size, with the second size being different from the first size (block 725). After block 725, method 700 ends. If the memory fetch granule is another size (conditional block 715, “other size” leg), then the real-time agent attempts to accumulate requests for the given memory controller until the amount of data referenced by the requests reaches the specific size corresponding to the particular memory controller (block 730). After block 730, method 700 ends.

Turning now to FIG. 8, one embodiment of a method 800 for an arbiter processing requests grouped into memory fetch granule sized chunks is shown. An arbiter receives a credit for forwarding a request on a next hop to memory (block 805). The arbiter may be located in any suitable location, such as within a communication fabric, within a memory controller, or in any other position within a system or apparatus. The arbiter determines which request to select from a plurality of enqueued requests (block 810). If the arbiter selects an RT request (conditional block 815, “yes” leg), then the arbiter checks the EoM flag to determine if this request is the last request of an MFG (conditional block 820). In one embodiment, the arbiter only selects an RT request if the entire MFG containing the RT request has already been received by the arbiter. If the arbiter selects a non-RT request (conditional block 815, “no” leg), then the arbiter forwards the request on the next hop to memory (block 825), and then method 800 ends.

If the RT request selected by the arbiter has the EoM flag set (conditional block 820, “yes” leg), then the arbiter forwards the request on the next hop to memory (block 825), and then method 800 ends. If the RT request selected by the arbiter does not have its EoM flag set (conditional block 820, “no” leg), then the arbiter forwards the request on the next hop to memory and continues selecting other requests of the same MFG as credits become available until the entire MFG has been forwarded on the path to memory (block 830). In other words, in block 830, the arbiter waits to select other requests for forwarding until the entire MFG has been forwarded on the path to memory. After block 830, method 800 ends.
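One way to sketch this forwarding discipline (blocks 815-830) is shown below; the Arbiter class and its two-queue structure are assumptions for illustration, not the patent's circuitry:

    from collections import deque
    from dataclasses import dataclass

    @dataclass
    class Req:
        is_rt: bool
        eom: bool = False   # end-of-MFG flag

    class Arbiter:
        def __init__(self):
            self.queues = [deque(), deque()]   # e.g., RT and non-RT
            self.locked = None                 # queue of the in-flight MFG

        def on_credit(self):
            # Forward one request per credit; once an RT request without
            # EoM is forwarded, stay with that MFG until EoM (block 830).
            q = self.locked or next((q for q in self.queues if q), None)
            if q is None:
                return None
            req = q.popleft()
            self.locked = q if (req.is_rt and not req.eom) else None
            return req   # forwarded on the next hop to memory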

Referring now to FIG. 9, one embodiment of a method 900 for an agent enqueuing or overriding enqueuing of requests is shown. A control unit (e.g., control unit 215 of FIG. 2) of an agent (e.g., agent 210A) detects a memory request targeting a given memory controller (block 905). In response to detecting the memory request targeting the given memory controller, the control unit determines whether to enqueue the memory request or override the enqueuing of the memory request (block 910). Depending on the embodiment, the control unit uses any of various different techniques for determining whether to enqueue the memory request or override the enqueuing of the memory request. In one embodiment, the control unit uses the value of an end of MFG flag associated with the memory request to determine whether to enqueue the memory request or override the enqueuing of memory requests.

If an end of MFG flag associated with the memory request is not set (conditional block 915, “no” leg), then the control unit enqueues the memory request and waits to send the memory request until a corresponding MFG size threshold is satisfied (block 920). After block 920, method 900 ends. In one embodiment, the corresponding MFG size threshold is satisfied when the sum of sizes of data referenced by enqueued memory requests equals the corresponding MFG size for the given memory controller. In one embodiment, the corresponding MFG size is a multiple of the memory request size. In another embodiment, the corresponding MFG size threshold is satisfied when the sum of sizes of data referenced by enqueued memory requests is closest to the corresponding MFG size without going over. In other embodiments, other ways of satisfying the corresponding MFG size threshold are possible and are contemplated.

If an end of MFG flag associated with the memory request is set (conditional block 915, “yes” leg), then the control unit overrides the enqueuing mechanism and immediately sends the memory request to the given memory controller (block 925). The control unit may also send other enqueued requests with the memory request in some cases. Alternatively, if there are no other enqueued requests, then the control unit sends the memory request by itself. After block 925, method 900 ends.
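A compact sketch of method 900's enqueue-or-override decision (blocks 915-925) follows; the EnqueueControl class, its send callback, and the req.size/req.eom fields are assumptions for illustration:

    class EnqueueControl:
        # Illustrative stand-in for the control unit's enqueue logic.
        def __init__(self, mfg_size, send):
            self.mfg_size, self.send = mfg_size, send
            self.pending, self.pending_bytes = [], 0

        def handle(self, req):
            self.pending.append(req)
            self.pending_bytes += req.size
            # A set EoM flag overrides enqueuing and flushes immediately
            # (block 925); otherwise wait until the MFG size threshold
            # is satisfied (block 920).
            if req.eom or self.pending_bytes >= self.mfg_size:
                self.send(self.pending)
                self.pending, self.pending_bytes = [], 0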

Referring now to FIG. 10, a block diagram of one embodiment of a system 1000 is shown that may incorporate and/or otherwise utilize the methods and mechanisms described herein. In the illustrated embodiment, the system 1000 includes at least a portion of SOC 100 (of FIG. 1) which may include multiple types of processing units, such as a central processing unit (CPU), a graphics processing unit (GPU), or otherwise, a communication fabric, and interfaces to memories and input/output devices. In various embodiments, SoC 100 is coupled to external memory 1002, peripherals 1004, and power supply 1008.

A power supply 1008 is also provided which supplies the supply voltages to SoC 100 as well as one or more supply voltages to the memory 1002 and/or the peripherals 1004. In various embodiments, power supply 1008 represents a battery (e.g., a rechargeable battery in a smart phone, laptop or tablet computer, or other device). In some embodiments, more than one instance of SoC 100 is included (and more than one external memory 1002 may be included as well).

The memory 1002 is any type of memory, such as dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate (DDR, DDR2, DDR3, etc.) SDRAM (including mobile versions of the SDRAMs such as mDDR3, etc., and/or low power versions of the SDRAMs such as LPDDR2, etc.), RAMBUS DRAM (RDRAM), static RAM (SRAM), etc. One or more memory devices may be coupled onto a circuit board to form memory modules such as single inline memory modules (SIMMs), dual inline memory modules (DIMMs), etc. Alternatively, the devices are mounted with a SoC or an integrated circuit in a chip-on-chip configuration, a package-on-package configuration, or a multi-chip module configuration.

The peripherals 1004 include any desired circuitry, depending on the type of system 1000. For example, in one embodiment, peripherals 1004 include devices for various types of wireless communication, such as Wi-Fi, Bluetooth, cellular, global positioning system, etc. In some embodiments, the peripherals 1004 also include additional storage, including RAM storage, solid state storage, or disk storage. The peripherals 1004 include user interface devices such as a display screen, including touch display screens or multitouch display screens, keyboard or other input devices, microphones, speakers, etc.

As illustrated, system 1000 is shown to have application in a wide range of areas. For example, system 1000 may be utilized as part of the chips, circuitry, components, etc., of a desktop computer 1010, laptop computer 1020, tablet computer 1030, cellular or mobile phone 1040, or television 1050 (or set-top box coupled to a television). Also illustrated is a smartwatch and health monitoring device 1060. In some embodiments, the smartwatch may include a variety of general-purpose computing related functions. For example, the smartwatch may provide access to email, cellphone service, a user calendar, and so on. In various embodiments, a health monitoring device may be a dedicated medical device or otherwise include dedicated health related functionality. For example, a health monitoring device may monitor a user's vital signs, track proximity of a user to other users for the purpose of epidemiological social distancing and contact tracing, provide communication to an emergency service in the event of a health crisis, and so on. In various embodiments, the above-mentioned smartwatch may or may not include some or any health monitoring related functions. Other wearable devices are contemplated as well, such as devices worn around the neck, devices that are implantable in the human body, glasses designed to provide an augmented and/or virtual reality experience, and so on.

System 1000 may further be used as part of a cloud-based service(s) 1070. For example, the previously mentioned devices, and/or other devices, may access computing resources in the cloud (i.e., remotely located hardware and/or software resources). Still further, system 1000 may be utilized in one or more devices of a home 1080 other than those previously mentioned. For example, appliances within the home 1080 may monitor and detect conditions that warrant attention. For example, various devices within the home 1080 (e.g., a refrigerator, a cooling system, etc.) may monitor the status of the device and provide an alert to the homeowner (or, for example, a repair facility) should a particular event be detected. Alternatively, a thermostat may monitor the temperature in the home 1080 and may automate adjustments to a heating/cooling system based on a history of responses to various conditions by the homeowner. Also illustrated in FIG. 10 is the application of system 1000 to various modes of transportation 1090. For example, system 1000 may be used in the control and/or entertainment systems of aircraft, trains, buses, cars for hire, private automobiles, waterborne vessels from private boats to cruise liners, scooters (for rent or owned), and so on. In various cases, system 1000 may be used to provide automated guidance (e.g., self-driving vehicles), general systems control, and otherwise. These and many other embodiments are possible and are contemplated. It is noted that the devices and applications illustrated in FIG. 10 are illustrative only and are not intended to be limiting. Other devices are possible and are contemplated.

The present disclosure includes references to “an embodiment” or groups of “embodiments” (e.g., “some embodiments” or “various embodiments”). Embodiments are different implementations or instances of the disclosed concepts. References to “an embodiment,” “one embodiment,” “a particular embodiment,” and the like do not necessarily refer to the same embodiment. A large number of possible embodiments are contemplated, including those specifically disclosed, as well as modifications or alternatives that fall within the spirit or scope of the disclosure.

This disclosure may discuss potential advantages that may arise from the disclosed embodiments. Not all implementations of these embodiments will necessarily manifest any or all of the potential advantages. Whether an advantage is realized for a particular implementation depends on many factors, some of which are outside the scope of this disclosure. In fact, there are a number of reasons why an implementation that falls within the scope of the claims might not exhibit some or all of any disclosed advantages. For example, a particular implementation might include other circuitry outside the scope of the disclosure that, in conjunction with one of the disclosed embodiments, negates or diminishes one or more of the disclosed advantages. Furthermore, suboptimal design execution of a particular implementation (e.g., implementation techniques or tools) could also negate or diminish disclosed advantages. Even assuming a skilled implementation, realization of advantages may still depend upon other factors such as the environmental circumstances in which the implementation is deployed. For example, inputs supplied to a particular implementation may prevent one or more problems addressed in this disclosure from arising on a particular occasion, with the result that the benefit of its solution may not be realized. Given the existence of possible factors external to this disclosure, it is expressly intended that any potential advantages described herein are not to be construed as claim limitations that must be met to demonstrate infringement. Rather, identification of such potential advantages is intended to illustrate the type(s) of improvement available to designers having the benefit of this disclosure. That such advantages are described permissively (e.g., stating that a particular advantage “may arise”) is not intended to convey doubt about whether such advantages can in fact be realized, but rather to recognize the technical reality that realization of such advantages often depends on additional factors.

Unless stated otherwise, embodiments are non-limiting. That is, the disclosed embodiments are not intended to limit the scope of claims that are drafted based on this disclosure, even where only a single example is described with respect to a particular feature. The disclosed embodiments are intended to be illustrative rather than restrictive, absent any statements in the disclosure to the contrary. The application is thus intended to permit claims covering disclosed embodiments, as well as such alternatives, modifications, and equivalents that would be apparent to a person skilled in the art having the benefit of this disclosure.

For example, features in this application may be combined in any suitable manner. Accordingly, new claims may be formulated during prosecution of this application (or an application claiming priority thereto) to any such combination of features. In particular, with reference to the appended claims, features from dependent claims may be combined with those of other dependent claims where appropriate, including claims that depend from other independent claims. Similarly, features from respective independent claims may be combined where appropriate.

Accordingly, while the appended dependent claims may be drafted such that each depends on a single other claim, additional dependencies are also contemplated. Any combinations of features in the dependent claims that are consistent with this disclosure are contemplated and may be claimed in this or another application. In short, combinations are not limited to those specifically enumerated in the appended claims.

Where appropriate, it is also contemplated that claims drafted in one format or statutory type (e.g., apparatus) are intended to support corresponding claims of another format or statutory type (e.g., method).

Because this disclosure is a legal document, various terms and phrases may be subject to administrative and judicial interpretation. Public notice is hereby given that the following paragraphs, as well as definitions provided throughout the disclosure, are to be used in determining how to interpret claims that are drafted based on this disclosure.

References to a singular form of an item (i.e., a noun or noun phrase preceded by “a,” “an,” or “the”) are, unless context clearly dictates otherwise, intended to mean “one or more.” Reference to “an item” in a claim thus does not, without accompanying context, preclude additional instances of the item. A “plurality” of items refers to a set of two or more of the items.

The word “may” is used herein in a permissive sense (i.e., having the potential to, being able to) and not in a mandatory sense (i.e., must).

The terms “comprising” and “including,” and forms thereof, are open-ended and mean “including, but not limited to.”

When the term “or” is used in this disclosure with respect to a list of options, it will generally be understood to be used in the inclusive sense unless the context provides otherwise. Thus, a recitation of “x or y” is equivalent to “x or y, or both,” and thus covers 1) x but not y, 2) y but not x, and 3) both x and y. On the other hand, a phrase such as “either x or y, but not both” makes clear that “or” is being used in the exclusive sense.

A recitation of “w, x, y, or z, or any combination thereof” or “at least one of . . . w, x, y, and z” is intended to cover all possibilities involving a single element up to the total number of elements in the set. For example, given the set [w, x, y, z], these phrasings cover any single element of the set (e.g., w but not x, y, or z), any two elements (e.g., w and x, but not y or z), any three elements (e.g., w, x, and y, but not z), and all four elements. The phrase “at least one of . . . w, x, y, and z” thus refers to at least one element of the set [w, x, y, z], thereby covering all possible combinations in this list of elements. This phrase is not to be interpreted to require that there is at least one instance of w, at least one instance of x, at least one instance of y, and at least one instance of z.

Various “labels” may precede nouns or noun phrases in this disclosure. Unless context provides otherwise, different labels used for a feature (e.g., “first circuit,” “second circuit,” “particular circuit,” “given circuit,” etc.) refer to different instances of the feature. Additionally, the labels “first,” “second,” and “third” when applied to a feature do not imply any type of ordering (e.g., spatial, temporal, logical, etc.), unless stated otherwise.

The phrase “based on” is used to describe one or more factors that affect a determination. This term does not foreclose the possibility that additional factors may affect the determination. That is, a determination may be solely based on specified factors or based on the specified factors as well as other, unspecified factors. Consider the phrase “determine A based on B.” This phrase specifies that B is a factor that is used to determine A or that affects the determination of A. This phrase does not foreclose that the determination of A may also be based on some other factor, such as C. This phrase is also intended to cover an embodiment in which A is determined based solely on B. As used herein, the phrase “based on” is synonymous with the phrase “based at least in part on.”

The phrases “in response to” and “responsive to” describe one or more factors that trigger an effect. This phrase does not foreclose the possibility that additional factors may affect or otherwise trigger the effect, either jointly with the specified factors or independent from the specified factors. That is, an effect may be solely in response to those factors, or may be in response to the specified factors as well as other, unspecified factors. Consider the phrase “perform A in response to B.” This phrase specifies that B is a factor that triggers the performance of A, or that triggers a particular result for A. This phrase does not foreclose that performing A may also be in response to some other factor, such as C. This phrase also does not foreclose that performing A may be jointly in response to B and C. This phrase is also intended to cover an embodiment in which A is performed solely in response to B. As used herein, the phrase “responsive to” is synonymous with the phrase “responsive at least in part to.” Similarly, the phrase “in response to” is synonymous with the phrase “at least in part in response to.”

Within this disclosure, different entities (which may variously be referred to as “units,” “circuits,” other components, etc.) may be described or claimed as “configured” to perform one or more tasks or operations. This formulation—[entity] configured to [perform one or more tasks]—is used herein to refer to structure (i.e., something physical). More specifically, this formulation is used to indicate that this structure is arranged to perform the one or more tasks during operation. A structure can be said to be “configured to” perform some task even if the structure is not currently being operated. Thus, an entity described or recited as being “configured to” perform some task refers to something physical, such as a device, circuit, a system having a processor unit and a memory storing program instructions executable to implement the task, etc. This phrase is not used herein to refer to something intangible.

In some cases, various units/circuits/components may be described herein as performing a set of tasks or operations. It is understood that those entities are “configured to” perform those tasks/operations, even if not specifically noted.

The term “configured to” is not intended to mean “configurable to.” An unprogrammed FPGA, for example, would not be considered to be “configured to” perform a particular function. This unprogrammed FPGA may be “configurable to” perform that function, however. After appropriate programming, the FPGA may then be said to be “configured to” perform the particular function.

For purposes of United States patent applications based on this disclosure, reciting in a claim that a structure is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) for that claim element. Should Applicant wish to invoke Section 112(f) during prosecution of a United States patent application based on this disclosure, it will recite claim elements using the “means for” [performing a function] construct.

Different “circuits” may be described in this disclosure. These circuits or “circuitry” constitute hardware that includes various types of circuit elements, such as combinatorial logic, clocked storage devices (e.g., flip-flops, registers, latches, etc.), finite state machines, memory (e.g., random-access memory, embedded dynamic random-access memory), programmable logic arrays, and so on. Circuitry may be custom designed, or taken from standard libraries. In various implementations, circuitry can, as appropriate, include digital components, analog components, or a combination of both. Certain types of circuits may be commonly referred to as “units” (e.g., a decode unit, an arithmetic logic unit (ALU), functional unit, memory management unit (MMU), etc.). Such units also refer to circuits or circuitry.

The disclosed circuits/units/components and other elements illustrated in the drawings and described herein thus include hardware elements such as those described in the preceding paragraph. In many instances, the internal arrangement of hardware elements within a particular circuit may be specified by describing the function of that circuit. For example, a particular “decode unit” may be described as performing the function of “processing an opcode of an instruction and routing that instruction to one or more of a plurality of functional units,” which means that the decode unit is “configured to” perform this function. This specification of function is sufficient, to those skilled in the computer arts, to connote a set of possible structures for the circuit.

In various embodiments, as discussed in the preceding paragraph, circuits, units, and other elements may be defined by the functions or operations that they are configured to implement. The arrangement of such circuits/units/components with respect to each other and the manner in which they interact form a microarchitectural definition of the hardware that is ultimately manufactured in an integrated circuit or programmed into an FPGA to form a physical implementation of the microarchitectural definition. Thus, the microarchitectural definition is recognized by those of skill in the art as structure from which many physical implementations may be derived, all of which fall into the broader structure described by the microarchitectural definition. That is, a skilled artisan presented with the microarchitectural definition supplied in accordance with this disclosure may, without undue experimentation and with the application of ordinary skill, implement the structure by coding the description of the circuits/units/components in a hardware description language (HDL) such as Verilog or VHDL. The HDL description is often expressed in a fashion that may appear to be functional. But to those of skill in the art in this field, this HDL description is the manner that is used to transform the structure of a circuit, unit, or component to the next level of implementational detail. Such an HDL description may take the form of behavioral code (which is typically not synthesizable), register transfer language (RTL) code (which, in contrast to behavioral code, is typically synthesizable), or structural code (e.g., a netlist specifying logic gates and their connectivity). The HDL description may subsequently be synthesized against a library of cells designed for a given integrated circuit fabrication technology, and may be modified for timing, power, and other reasons to result in a final design database that is transmitted to a foundry to generate masks and ultimately produce the integrated circuit. Some hardware circuits or portions thereof may also be custom-designed in a schematic editor and captured into the integrated circuit design along with synthesized circuitry. The integrated circuits may include transistors and other circuit elements (e.g., passive elements such as capacitors, resistors, inductors, etc.) and interconnect between the transistors and circuit elements. Some embodiments may implement multiple integrated circuits coupled together to implement the hardware circuits, and/or discrete elements may be used in some embodiments. Alternatively, the HDL design may be synthesized to a programmable logic array such as a field programmable gate array (FPGA) and may be implemented in the FPGA. This decoupling between the design of a group of circuits and the subsequent low-level implementation of these circuits commonly results in the scenario in which the circuit or logic designer never specifies a particular set of structures for the low-level implementation beyond a description of what the circuit is configured to do, as this process is performed at a different stage of the circuit implementation process.

The fact that many different low-level combinations of circuit elements may be used to implement the same specification of a circuit results in a large number of equivalent structures for that circuit. As noted, these low-level circuit implementations may vary according to changes in the fabrication technology, the foundry selected to manufacture the integrated circuit, the library of cells provided for a particular project, etc. In many cases, the choices made by different design tools or methodologies to produce these different implementations may be arbitrary.

Moreover, it is common for a single implementation of a particular functional specification of a circuit to include, for a given embodiment, a large number of devices (e.g., millions of transistors). Accordingly, the sheer volume of this information makes it impractical to provide a full recitation of the low-level structure used to implement a single embodiment, let alone the vast array of equivalent possible implementations. For this reason, the present disclosure describes structure of circuits using the functional shorthand commonly employed in the industry.

Various units, circuits, or other components may be described as “configured to” perform a task or tasks. In such contexts, “configured to” is a broad recitation of structure generally meaning “having circuitry that” performs the task or tasks during operation. As such, the unit/circuit/component can be configured to perform the task even when the unit/circuit/component is not currently on. In general, the circuitry that forms the structure corresponding to “configured to” may include hardware circuits. Similarly, various units/circuits/components may be described as performing a task or tasks, for convenience in the description. Such descriptions should be interpreted as including the phrase “configured to.” Reciting a unit/circuit/component that is configured to perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) for that unit/circuit/component.

In the following description, numerous specific details are set forth to provide a thorough understanding of the embodiments described in this disclosure. However, one having ordinary skill in the art should recognize that the embodiments might be practiced without these specific details. In some instances, well-known circuits, structures, and techniques have not been shown in detail for ease of illustration and to avoid obscuring the description of the embodiments.

It should be emphasized that the above-described embodiments are only non-limiting examples of implementations. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

What is claimed is:
1. An apparatus comprising: one or more agents; a first memory controller configured to: define a first memory fetch granule for the one or more agents to use when forwarding real-time memory requests which target first memory coupled to the first memory controller, wherein the first memory fetch granule specifies an access granularity for the first memory; convey an indication of a size of the first memory fetch granule to the one or more agents; wherein a first agent, of the one or more agents, is configured to: receive the indication of the size of the first memory fetch granule; accumulate memory requests having a sequential access pattern which target the first memory coupled to the first memory controller until a sum of sizes of data referenced by the memory requests satisfies a first memory fetch granule size threshold; and send the memory requests to the first memory controller based at least in part on the sum of the sizes of data referenced by the memory requests satisfying the first memory fetch granule size threshold.
2. The apparatus as recited in claim 1, wherein the first agent is further configured to: tag, with an end of memory fetch granule flag, a last memory request in a group of memory requests based on the group of memory requests having a sequential access pattern and the sum of sizes of data referenced by the group satisfying the first memory fetch granule size threshold; and tag, with the end of memory fetch granule flag, a given memory request which does not correspond to a sequential access pattern even if the given memory request references an amount of data less than the size of the first memory fetch granule.
3. The apparatus as recited in claim 1, wherein a size of the first memory fetch granule corresponds to interleaving data across multiple banks located in different bank groups of memory devices coupled to the first memory controller.
4. The apparatus as recited in claim 1, further comprising a second memory controller which defines a second memory fetch granule different from the first memory fetch granule, wherein the second memory controller is configured to send an indication of a size of the second memory fetch granule to the one or more agents.
5. The apparatus as recited in claim 4, wherein the first agent is further configured to accumulate a second group of memory requests targeting second memory coupled to the second memory controller until a sum of data referenced by the second group of memory requests satisfies a second memory fetch granule size threshold.
6. The apparatus as recited in claim 1, wherein the first agent is further configured to store the indication of the size of the first memory fetch granule in a memory fetch granule table.
7. The apparatus as recited in claim 1, further comprising a first arbiter, wherein the first memory controller is configured to send an indication of the size of the first memory fetch granule to the first arbiter, and wherein the first arbiter is configured to: accumulate real-time requests from the first agent targeting the first memory coupled to the first memory controller until a sum of sizes of data referenced by the memory requests satisfies the first memory fetch granule size threshold; determine if a given memory request, selected as a winner of arbitration, is a last memory request of a given memory fetch granule; and if the given memory request is not the last memory request of the given memory fetch granule, forward the given memory request of the given memory fetch granule to the first memory controller and continue sending other memory requests of the given memory fetch granule as credits become available until an entirety of the given memory fetch granule has been forwarded on the path to the first memory controller.
8. The apparatus as recited in claim 7, wherein after forwarding the given memory request on the path to the first memory controller, the first arbiter is configured to prevent memory requests of other memory fetch granules from being eligible for arbitration until the entirety of the given memory fetch granule has been forwarded on the path to the first memory controller.
9. The apparatus as recited in claim 1, wherein the first memory fetch granule is dependent on a dynamic random-access memory (DRAM) technology of memory devices coupled to the first memory controller.
10. A method comprising: defining, by a first memory controller, a first memory fetch granule for one or more agents to use when forwarding real-time memory requests which target first memory coupled to the first memory controller, wherein the first memory fetch granule specifies an access granularity for the first memory; conveying an indication of a size of the first memory fetch granule to the one or more agents; receiving, by a first agent of the one or more agents, the indication of the size of the first memory fetch granule; accumulating, by the first agent, memory requests having a sequential access pattern which target the first memory coupled to the first memory controller until a sum of sizes of data referenced by the memory requests satisfies a first memory fetch granule size threshold; and sending, by the first agent, the memory requests to the first memory controller based at least in part on the sum of the sizes of data referenced by the memory requests satisfying the first memory fetch granule size threshold.
11. The method as recited in claim 10, further comprising the first agent tagging, with an end of memory fetch granule flag, a last memory request in a group of memory requests based on the group of memory requests having a sequential access pattern and the sum of sizes of data referenced by the group satisfying the first memory fetch granule size threshold.
12. The method as recited in claim 11, further comprising the first agent tagging, with the end of memory fetch granule flag, a given memory request which does not correspond to a sequential access pattern even if the given memory request references an amount of data less than the size of the first memory fetch granule.
13. The method as recited in claim 10, wherein a size of the first memory fetch granule corresponds to interleaving data across multiple banks located in different bank groups of memory devices coupled to the first memory controller.
14. The method as recited in claim 10, further comprising a second memory controller sending an indication of a size of a second memory fetch granule to the one or more agents, wherein the second memory fetch granule is different from the first memory fetch granule.
15. The method as recited in claim 14, further comprising the first agent accumulating a second group of memory requests targeting second memory coupled to the second memory controller until a sum of data referenced by the second group of memory requests satisfies a second memory fetch granule size threshold.
16. A system comprising: one or more agents; a first arbiter; and a first memory controller configured to: define a first memory fetch granule for the one or more agents to use when forwarding real-time memory requests which target first memory coupled to the first memory controller, wherein the first memory fetch granule specifies an access granularity for the first memory; convey an indication of a size of the first memory fetch granule to the one or more agents and the first arbiter; and wherein a first agent, of the one or more agents, is configured to: receive the indication of the size of the first memory fetch granule; accumulate memory requests having a sequential access pattern which target the first memory coupled to the first memory controller until a sum of sizes of data referenced by the memory requests satisfies a first memory fetch granule size threshold; and send the memory requests to the first memory controller based at least in part on the sum of the sizes of data referenced by the memory requests satisfying the first memory fetch granule size threshold; wherein the first arbiter is configured to: accumulate real-time requests from the first agent targeting the first memory coupled to the first memory controller until a sum of sizes of data referenced by the memory requests satisfies the first memory fetch granule size threshold; determine if a given memory request, selected as a winner of arbitration, is a last memory request of a given memory fetch granule; and if the given memory request is not the last memory request of the given memory fetch granule, forward the given memory request of the given memory fetch granule to the first memory controller and continue sending other memory requests of the given memory fetch granule as credits become available until an entirety of the given memory fetch granule has been forwarded on the path to the first memory controller.
17. The system as recited in claim 16, wherein after forwarding the given memory request on the path to the first memory controller, the first arbiter is configured to prevent memory requests of other memory fetch granules from being eligible for arbitration until the entirety of the given memory fetch granule has been forwarded on the path to the first memory controller.
18. The system as recited in claim 16, wherein the first memory fetch granule is dependent on a dynamic random-access memory (DRAM) technology of memory devices coupled to the first memory controller.
19. The system as recited in claim 16, wherein the first agent is further configured to store the indication of the size of the first memory fetch granule in a memory fetch granule table.
20. The system as recited in claim 19, wherein the first agent is further configured to store, in the memory fetch granule table, a plurality of indications of sizes of a plurality of different memory fetch granules corresponding to different memory controllers.
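For illustration only, and not as part of the claims, the accumulate-and-tag behavior recited in claims 1, 2, 11, and 12 might be reduced to RTL along the following lines. All module and signal names, the signal widths, and the 256-byte granule size are assumptions made for this sketch; the claims do not prescribe any particular implementation, and residual bytes beyond a full granule are simply discarded here for brevity.

    // Hypothetical RTL sketch of a per-agent memory fetch granule accumulator.
    module mfg_accumulator #(
      parameter [15:0] GRANULE_BYTES = 16'd256  // assumed granule size
    )(
      input  wire        clk,
      input  wire        rst_n,
      input  wire        req_valid,        // a memory request is presented
      input  wire        req_sequential,   // request continues the sequential pattern
      input  wire [7:0]  req_bytes,        // bytes of data referenced by the request
      output reg         send_group,       // release accumulated requests downstream
      output reg         end_of_granule    // tag for the last request of a granule
    );
      reg [15:0] acc_bytes;  // running sum of sizes of data referenced so far

      always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
          acc_bytes      <= 16'd0;
          send_group     <= 1'b0;
          end_of_granule <= 1'b0;
        end else begin
          send_group     <= 1'b0;
          end_of_granule <= 1'b0;
          if (req_valid) begin
            if (!req_sequential) begin
              // A non-sequential request is tagged end-of-granule even though
              // less than a full granule has accumulated (cf. claims 2 and 12).
              acc_bytes      <= 16'd0;
              send_group     <= 1'b1;
              end_of_granule <= 1'b1;
            end else if (acc_bytes + {8'd0, req_bytes} >= GRANULE_BYTES) begin
              // The sum of request sizes satisfies the granule size threshold:
              // tag this request and send the group (cf. claims 1 and 10).
              acc_bytes      <= 16'd0;
              send_group     <= 1'b1;
              end_of_granule <= 1'b1;
            end else begin
              // Keep accumulating sequential requests toward the threshold.
              acc_bytes <= acc_bytes + {8'd0, req_bytes};
            end
          end
        end
      end
    endmodule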